ABSTRACT

Layout2, written in OPS5, is a production system that models a
subset of high-level visual information for spatial layout. This
information is modeled as a system of forward-chaining rules that
infer properties of the layout of the environment from properties
of its optical projection in combination with constraints on the
environment. The rules are deterministic rather than
probabilistic; apparent conflicts between sets of rules are
resolved by adding additional information that refines their
application. The system currently includes a subset of visual
information arising from linear perspective and ground-contact
relations. Layout2 shows that the interaction of these sources of
information, in appropriate environments, can specify the
complete, environment-centered representation of a spatial layout
of polygonal surfaces. It is suggested that such analyses of
sources of visual information and their interaction are useful
preparatory steps, prior to the study of algorithms for vision and
their implementation.


1.  INTRODUCTION 

1.1.  Ecological  optics 

Extensive  visual  capabilities  exist  in  a  wide  range  of  biological 
organisms.  More  limited  capabilities  have  been  created  in  a  variety  of 
computer  vision  systems,  with  expansions  of  these  capabilities  being  actively 
pursued.  There  is  considerable  interest  in  the  relations  between  biological 
and  computer  vision  (Beck,  Hope,  &  Rosenfeld,  1983).  The  attraction  of 
biological  vision  to  a  researcher  in  computer  vision  is  that  it  works  and  it 
does so under demanding real-time constraints. On the other hand, the
attraction  of  computer  vision  to  a  researcher  in  biological  vision  is  that  it 
allows  hypotheses  about  how  vision  might  work  to  be  precisely  and  completely 
implemented  and  rigorously  tested. 

The  variety  of  existing  and  potential  visual  systems  is  enormous.  In 
particular,  there  are  fundamental  differences,  both  in  their  constituent 
elements  and  in  their  organization,  between  biological  and  computer  vision 
systems.  Nevertheless,  if  a  vision  system  is  to  perform  certain  functions, 
there  are  certain  problems  that  it  must  solve.  The  function  of  object 
recognition,  for  example,  must  deal  with  the  problem  that  a  three-dimensional 
solid  object  has  differing  two-dimensional  optical  projections  when  viewed 
from  differing  points  of  observation.  Functions  such  as  locomotion  and  object 
manipulation  that  require  the  system  to  interact  with  its  environment  must 
solve  the  problem  of  determining  the  three-dimensional  spatial  layout  of 
relevant  portions  of  the  environment  based  on  one  or  more  two-dimensional 
optical projections. The difficulty that researchers have had in
understanding biological vision systems and in creating computer vision
systems  is  due  in  large  part  to  the  difficulty  of  the  problems  that  such 
systems  must  solve. 

Problems  such  as  those  referred  to  above  would  be  completely  intractable 
were it not for the existence of predictable regularities in the environments
within  which  vision  systems  must  function.  Biological  vision  systems  are 
closely  adapted,  through  evolution  and  learning,  to  the  characteristics  of 
their  natural  environments.  Most  existing  computer  vision  systems  function 
within  rigorously  controlled  environments,  but  this  control  must  be  loosened 
if  the  range  of  functions  of  these  systems  is  to  be  extended.  A  broad 
underlying  problem  for  any  vision  system  is  how  to  make  use  of  reliable 
environmental  constraints  to  overcome  the  inherent  ambiguity  that  arises  in 
the  optical  projection  from  three  dimensions  onto  two. 

The  relations  that  map  combinations  of  environmental  constraints  and  two- 
dimensional  optical  projections  onto  three-dimensional  environments  constitute 
a  solution  space  within  which  any  vision  system  must  operate.  This  solution 
space  exists  independently  of  any  particular  vision  system  in  the  same  way 
that,  for  example,  aerodynamics  exists  independently  of  the  birds  and  aircraft 
whose  flight  depends  upon  their  successful  incorporation  of  aerodynamic 
principles. 

The  first  clear  formulation  of  a  discipline  concerned  with  the 
underlying,  environmentally-determined  possibilities  of  vision,  apart  from  the 
study  of  any  particular  vision  system,  seems  to  have  been  developed  by  J.  J. 
Gibson  (1961).  Gibson  referred  to  this  new  discipline  as  "ecological  optics," 
meaning  the  study  of  the  behavior  of  light  in  natural  environments.  Gibson's 
ecological  optics  incorporated  a  number  of  particular  analyses,  such  as  the 
relation  between  surface  slant  and  projected  gradients  of  texture,  the 
relation between surface layout, observer motion, and optical flow fields, and
the relation between the depth organization of surfaces and the patterns of
dynamic occlusion that result from observer motion; it is clear, however, that
Gibson saw ecological optics primarily as an approach, a new discipline that
would continue to grow and evolve, rather than as a settled body of research
findings (Gibson, 1961, 1966, 1979). Ecological optics is concerned with
analyzing  the  relations  between  environment  and  optical  projection  that  make 
vision  possible.  Such  analysis  provides  a  foundation  for  the  understanding  of 
existing  vision  systems  and  for  the  construction  of  new  ones.  Discovering  or 
devising  algorithms  for  vision  and  implementations  of  these  algorithms 
depends,  at  least  implicitly,  on  such  an  underlying  analysis. 

The  present  project  is  intended  as  a  contribution  to  ecological  optics. 
It  aims  to  further  the  understanding  of  the  possibilities  of  vision,  rather 
than  to  offer  a  model  of  any  particular  vision  system.  The  goals  of  the 
project,  however,  are  influenced  by  an  underlying  interest  in  human  vision  and 
in  the  design  of  computer  vision  systems. 


1.2.  Sources  of  visual  information  for  spatial  layout 

The  ambient  light  reaching  a  point  of  observation  is  referred  to  here  as 
the  optic  array.  The  optic  array  is  thought  of  as  a  unit  sphere  that 
surrounds  the  point  of  observation  and  onto  which  all  of  the  visible  features 
of  the  environment  are  projected,  with  the  point  of  observation  as  the  center 
of  projection  (Gibson,  1961,  1979;  this  concept  is  closely  related  to  the 
concept  of  the  Gaussian  sphere,  which  has  been  used  increasingly  in  recent 
years  by  computer  vision  researchers  [Barnard,  1983;  Magee  and  Aggarwal,  1984] 
to  represent  the  optical  projection). 
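
The projection onto the optic array described above can be illustrated
with a minimal sketch. It is not part of Layout2, which takes high-level
descriptions of the optic array as its input; the function name and the
tuple representation of points are assumptions made here for illustration.
A point in the environment is mapped to the unit vector that points from
the point of observation toward it.

```python
import math

def project_to_optic_array(point, observer):
    """Return the unit vector from the observer toward an environmental point.

    Both arguments are (x, y, z) tuples in environment coordinates
    (a representation assumed for this sketch).
    """
    dx, dy, dz = (p - o for p, o in zip(point, observer))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("point coincides with the point of observation")
    return (dx / length, dy / length, dz / length)

# Two points on the same line of sight share one optic-array direction,
# which is the ambiguity of projecting three dimensions onto two.
origin = (0.0, 0.0, 0.0)
near = project_to_optic_array((1.0, 2.0, 2.0), origin)
far = project_to_optic_array((2.0, 4.0, 4.0), origin)
```

Because distance is discarded by the normalization, `near` and `far`
coincide; recovering the lost dimension is the problem taken up below.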

A  central  problem  for  vision,  as  was  indicated  above,  is  how  to  recover 
information  about  the  three-dimensional  spatial  layout  of  the  environment  from 
its two-dimensional projection. Theoretical analyses, some dating back as far
as Descartes (1637/1965), have revealed multiple possible sources of such
information, and experimental investigations have shown the influence of many
of these different sources on human vision. Additional analysis is
continually adding to and refining the understanding of these different
sources of information (for reviews see Barrow & Tenenbaum, 1986; Sedgwick,
1986).

Any  analysis  of  a  source  of  visual  information  has,  broadly  speaking,  the 
same  form  implicit  in  it.  A  source  of  visual  information  can  be 
conceptualized  as  a  system  of  inference  rules:  If  certain  conditions  are  met, 
then  certain  conclusions  can  be  reached  about  the  environment.  The  conditions 
that  must  be  met  consist,  in  part,  of  characteristics  of  the  optic  array  and, 
in  part,  of  constraints  on  the  environment.  Each  different  source  of 
information  is  represented  by  its  own  set  of  rules,  embodying  its  own 
particular  constraints. 

There are several limitations in any single source of visual information.
First is the question of deciding when the necessary environmental constraints
are satisfied; if there is no other information on which to base this decision,
then  default  assumptions  must  be  made.  Second,  it  may  be  that  a  given  source 
of  information  deals  with  only  some  portions  of  the  optic  array.  Third,  the 
conclusions  that  can  be  drawn  may  only  delimit  some  range  of  environmental 
characteristics  without  uniquely  specifying  one. 

1.3.  Modeling  interacting  sources  of  visual  information 

In  human  vision  the  limitations  on  any  single  source  of  visual 
information  are  normally  overcome  by  combining  multiple  sources  of  visual 
information  to  obtain  a  robust,  unique,  veridical  correspondence  between  the 
optic  array  and  the  perceived  environment.  How  this  combining  process  works 
is  not  at  all  well  understood,  however,  nor  are  the  principles  underlying  such 
combination  of  information  at  all  clear.  The  most  popular  theoretical 
approach,  first  formulated  and  developed  by  Brunswik  (1956;  Brunswik  &  Kamiya, 
1953),  has  been  to  combine  different  sources  of  information  by  taking  a 
weighted average of them; Brunswik determined these weights by assessing the
probabilistic validity of each of the different sources of information, but
other weighting schemes are possible (e.g., Wesley & Hanson, 1982). Averaging
schemes have had only limited success when applied to human vision. A
significant problem for this approach, as Rock (1977, 1983) and others have
shown, is that the ways in which visual information is used and combined can
be  heavily  dependent  on  complex  situational  variables. 

An  alternative  theoretical  approach  to  human  vision,  clearly  formulated 
by  Helmholtz  (1910/1962)  and  more  recently  adopted,  in  varying  forms,  by 
Hochberg  (1981),  Rock  (1977,  1983),  and  others,  has  been  to  regard  vision  as  a 
reasoning-like  process.  Confronted  with  complex  combinations  of  information, 
the  human  visual  system  often  responds  as  though  it  had  thought  through  the 
possibilities  and  decided  on  what  environment  would  most  reasonably  have  to 
exist  to  have  produced  this  particular  optic  array.  The  rational  model  of 
combining  information  is  supported  by  the  human  visual  system's  responsiveness 
to  situational  variables.  The  model  is  less  able,  however,  to  account  for  the 
many  situations  in  which  the  human  visual  system  seems  to  respond  quite 
mechanically,  perceiving  the  environment  in  ways  that  are  distinctly 
"unreasonable."  Also,  the  rational  model  has  so  far  remained  rather 
intuitive,  lacking  an  explicit  and  detailed  formulation  that  would  permit 
rigorous  testing. 

The  combination  of  different  sources  of  information  for  spatial  layout  is 
a  central  concern  of  the  present  project.  The  project  concentrates  on  the 
ecological  optics  of  multiple  sources  of  information  in  an  attempt  to  develop 
a  more  rigorous  foundation  on  which  future  analyses  of  human  and  computer 
vision  can  be  built.  Current  debates,  referred  to  above,  about  how 
information  is  combined  in  human  vision  may  remain  somewhat  premature  until  we 
reach  a  better  understanding  of  the  structure  of  the  underlying  visual 
information. 

The  approach  taken  here  is  rule-based  rather  than  probabilistic.  A  key 
idea  being  explored  in  the  present  project  is  that  conclusions  about  the 
environment  that  are  reached  by  one  set  of  rules  can  help  to  satisfy 
environmental  constraints  that  are  needed  for  some  other  set  of  rules.  In 
this  way  multiple  sources  of  information  may  interact  to  reach  more  powerful 
conclusions  than  are  available  to  any  single  source  in  isolation.  It  is 
postulated  that  the  number  of  default  assumptions  about  the  environment  that 
are  needed  to  establish  a  unique  correspondence  between  the  optic  array  and 
the  environment  may  decrease  as  the  number  of  different  sources  of 
information,  or  rules,  increases. 
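
The interaction described in this paragraph, in which a conclusion reached
by one rule supplies a condition needed by another, can be sketched as
naive forward chaining over a set of facts. This is only an illustrative
simplification: the fact names and the rule encoding are hypothetical, and
the sketch omits the pattern matching and conflict resolution of a real
production system such as OPS5.

```python
def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule can add a new fact.

    Each rule is a (conditions, conclusion) pair, where conditions is a
    frozenset of fact names that must all be present.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in derived and conditions <= derived:
                derived.add(conclusion)
                changed = True
    return derived

# Hypothetical rules, deliberately listed in reverse order of dependency.
RULES = [
    (frozenset({"surface_horizon", "edge_coplanar_with_surface"}),
     "edge_vanishing_point_on_horizon"),
    (frozenset({"vanishing_point_a", "vanishing_point_b"}),
     "surface_horizon"),
    (frozenset({"three_edges_converge"}), "vanishing_point_a"),
]

observed = {"three_edges_converge", "vanishing_point_b",
            "edge_coplanar_with_surface"}
conclusions = forward_chain(observed, RULES)
```

In this toy run the last rule can fire only after the first two have
fired, so the final conclusion is unavailable to any one rule taken in
isolation.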

Because  each  rule  in  this  conceptualization  has  some  conditions  that  are 
met  by  characteristics  of  the  optical  projection  of  the  environment,  the  ways 
that  rules  interact  are  sensitive  to  environmental  variables.  Sources  of 
visual  information  that  are  effective  in  one  situation  may  be  ineffective  in 
another. 

One objective of the present project has been to define the inference
rules representing a subset of the visual information for spatial layout with
enough  completeness  and  precision  so  that  their  interactions  can  be  explored 
in  detail.  The  goal  is  to  begin  the  development  of  a  theoretical  analysis  of 
the  integration  of  multiple  sources  of  information  into  a  single  system.  For 
practical  reasons  the  present  investigation  is  restricted  to  some  of  the 
visual  information  clustering  around  linear  perspective.  This  subset  of 
information  is  small  enough  to  be  managed  within  the  scope  of  the  present 
project  but  large  enough  to  offer  an  interesting  arena  within  which  to  explore 
the  interactions  discussed  above. 

1.4.  Environment-centered  perception  of  spatial  layout 

In  the  present  study,  the  environment  is  modeled  as  a  three-dimensional 
spatial  layout  of  connected  surfaces.  This  way  of  describing  the  environment 
was  first  advocated  and  systematically  pursued  by  J.  J.  Gibson  (1950),  and  has 
since been widely adopted in studies of both human and computer vision (Barrow
& Tenenbaum, 1978; Kanade, 1980, 1981; Marr, 1978, 1982).

The  present  study  uses  what  I  refer  to  as  an  "environment-centered" 
representation  of  this  spatial  layout  of  surfaces.  This  means  that  the  size, 
orientation,  and  location  of  any  surface  are  described  relative  to  the  fixed 
framework  of  the  environment.  Such  a  representation  is  invariant  across 
changes  in  the  position  of  the  viewer.  Although  the  desirability  for  a  vision 
system  of  achieving  such  a  representation  seems,  at  least  implicitly,  to  be 
widely accepted, the dominant current view appears to be that of Marr
(1978, 1982), who assumes that any such viewer-independent representation must
be derived from a prior "viewer-centered" representation in which the
orientation and distance of every surface are described relative to the
viewer's own point of observation. I have argued elsewhere (1983a; Sedgwick &
Levy, 1985) that a prior viewer-centered representation is, in theory,
unnecessary and that it is possible to obtain an environment-centered
representation  of  spatial  layout  directly  from  the  optic  array.  A  central 
goal  of  the  present  study  is  to  provide  a  detailed  and  explicit  development  of 
this  argument  by  modeling  and  testing  a  subset  of  visual  information  for 
spatial  layout  that  could  support  such  a  representation. 

This  subset  of  visual  information  has  two  related  but  distinct 
components.  The  first  component  is  information  based  on  linear  perspective, 
which  makes  it  possible  to  obtain  the  environment-centered  orientations  of 
edges  and  surfaces  directly  from  the  optic  array.  By  the  "perspective 
structure  of  the  optic  array,"  I  mean  the  vanishing  points  of  all  visible 
straight  edges  and  the  horizon  lines  of  all  visible  planar  surfaces  in  the 
environment.  It  is  these  vanishing  points  and  horizon  lines  that  directly 
specify  environment-centered  orientations.  Except  in  the  case  of  unbounded 
edges  and  surfaces,  however,  vanishing  points  and  horizons  do  not  appear  as 
actual  projections  in  the  optic  array.  They  are  generally  only  implicit.  For 
example,  the  vanishing  point  of  a  set  of  parallel  edges  is  the  point  in  the 
optic  array  at  which  the  projections  of  the  edges,  if  extended,  would  all 
intersect. Much of the present project is concerned with ways of determining
the implicit perspective structure of the optic array for various
environments.
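
The implicit vanishing point described above can be made concrete with a
short sketch. On a spherical optic array, a straight edge projects to an
arc of a great circle (the circle cut by the plane through the edge and
the point of observation), so the extrapolated intersection of two
projected edges is a point where their great circles cross. The function
names and the tuple representation are assumptions made for illustration,
and the point of observation is placed at the origin.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("degenerate direction")
    return tuple(c / length for c in v)

def great_circle_normal(p, q):
    """Unit normal of the great circle through the projections of p and q."""
    return unit(cross(p, q))

def extrapolated_intersection(edge1, edge2):
    """One of the two antipodal points where the extended projections meet.

    Each edge is a pair of (x, y, z) endpoints. Raises ValueError if both
    edges project onto the same great circle.
    """
    n1 = great_circle_normal(*edge1)
    n2 = great_circle_normal(*edge2)
    return unit(cross(n1, n2))

# Two parallel edges lying in the plane z = -1 and running along the y axis.
left_edge = ((-1.0, 1.0, -1.0), (-1.0, 5.0, -1.0))
right_edge = ((1.0, 1.0, -1.0), (1.0, 5.0, -1.0))
vanishing_point = extrapolated_intersection(left_edge, right_edge)
```

For these two edges the intersection lies along the y axis, the common
direction of the edges, although neither projected edge reaches that point.
The sign of the result is arbitrary, since two great circles always cross
at a pair of antipodal points.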

The  second  component  is  information  based  on  the  contact  relations 
between surfaces and the ground plane. In a terrestrial environment, most
objects  are  supported,  either  directly  or  indirectly,  by  an  extended  ground 
plane.  The  optical  projection  of  this  extended  plane  provides  a  visual 
connection  between  objects  that  can  be  used  to  establish  relative  scale  and 
location  independent  of  the  position  of  the  viewer.  The  use  of  contact  with 
the  ground  to  establish  scale  and  location  is  closely  tied  to  perspective 
information  in  two  ways.  First,  perspective  information  helps  to  determine 
whether  objects  are  in  contact  with  the  ground,  and  second,  the  terrestrial 
horizon  enters  into  the  relations  that  specify  scale  and  location. 

The  present  investigation  explores  the  interactions  by  which  visual 
information,  arising  from  regularities  in  the  environment  and  its  projection 
into  the  optic  array,  can  determine  the  perspective  structure  of  the  optic 
array  and  can  provide  a  complete  environment-centered  representation  of  the 
spatial layout of the environment. (See Sedgwick, 1983a, for a more detailed
development  and  analysis  of  the  concepts  presented  in  this  section.) 

1.5.  Background 

The basic approach taken here stems from the work of Gibson (1961, 1979),
as  do  many  of  the  particulars,  such  as  the  description  of  spatial  layout  as  an 
arrangement  of  visible  surfaces  and  the  idea  of  using  the  terrestrial  ground 
to  establish  scale  and  location  (Gibson,  1950). 

Concerning  perspective,  an  early  but  incomplete  analysis  of  the 
perspective  structure  of  the  optic  array  was  developed  by  Hay  (1974),  and  I 
have  extended  this  analysis  elsewhere  (Sedgwick,  1980,  1983a,  1986).  In  the 
study of human vision, there is a sizeable quantity of empirical research on
perspective,  mingled  with  a  certain  amount  of  analysis  (for  a  review,  see 
Sedgwick,  1986).  In  the  field  of  computer  vision,  most  analytical  work  in 
this  area  has  made  use  of  orthographic  rather  than  perspective  projection 
(e.g., Kanade, 1980, 1981). In recent years, however, there has been a growing
interest  in  perspective,  as  the  limitations  of  orthographic  projection, 
particularly  in  extended  scenes,  have  become  more  apparent  (Barnard,  1983, 
1985;  Kender,  1982,  1983;  Magee  &  Aggarwal,  1984). 

A few years ago I created a small pilot production system in OPS5 to
assess the feasibility and potential fruitfulness of the approach used in the
present investigation (Sedgwick, 1983b). Although this pilot system, referred
to as Layout, was based on a severely limited model of the environment and
implemented only a small subset of perspective information, it was very useful
in clarifying and shaping the present production system, which is consequently
referred to as Layout2. This report is based on a more detailed description
of Layout2 that has been presented elsewhere (Sedgwick, 1987).


2. OVERVIEW OF LAYOUT2

Layout2  is  intended  as  another  step  toward  the  development  of  a 
production  system  that  embodies  visual  information  that  could  be  used  in  the 
perception  of  spatial  layout.  Layout2's  goals  are  considerably  more  ambitious 
than  those  of  the  first  version  of  Layout,  but  what  is  aimed  at  here  is  still 
only  a  small  subset  of  a  comprehensive  system. 

This  section  provides  an  overview  of  Layout2.  The  overview  outlines  a 
set  of  constraints  and  a  set  of  rules  that  together  define  a  coherent  class  of 
environments  and  the  associated  visual  perspective  information,  based  on 
regularities  of  these  environments,  that  should  allow  the  three-dimensional 
structures  of  some  of  these  environments  to  be  recovered  from  their 
projections  in  the  optic  array.  Although  both  the  class  of  environments  and 
the  subset  of  visual  information  defined  here  are  carefully  limited  in  scope, 
they  appear  to  contain  sufficient  complexity  to  afford  an  interesting  test  of 
the  production  system  approach  to  studying  the  interactions  that  different 
environments  can  produce  within  a  subset  of  high-level,  partially-redundant, 
visual  information. 

2.1.  Constraints 

2.1.1.    Environmental  constraints.    The  following  constraints  are  imposed 
on  the  class  of  environments  that  are  allowed  in  Layout2: 

1.  The  environment  is  described  as  a  spatial  layout  of  visible  extended 
surfaces. 

2.  These  surfaces  must  be  convex  polygons.  Thus  all  surfaces  are 
planar  and  have  straight  edges.  This  constraint  makes  surfaces  easier 
to describe and fits naturally with Layout2's emphasis on perspective
information  carried  by  parallel  edges  and  surfaces.  This  constraint  is 
broad  enough  to  allow  a  wide  variety  of  surface  layouts  to  be  described. 

3.  Surfaces  may  share  edges.  This  allows  solid  objects  to  be 
described, as well as various origami-like objects. Shared edges must
match at the corners; thus, various other possible forms of contact
between surfaces are forbidden, such as partially overlapping edges,
surfaces meeting only at a corner, a corner touching the middle of an
edge, etc.

4. The environments modeled are terrestrial, which is taken to mean that
there  is  assumed  to  be  an  underlying,  infinitely  extended,  horizontal 
ground  plane  that  all  surfaces  either  rest  on,  are  supported  above,  or 
float  above.  This  ground  plane  provides  a  framework  for  determining  the 
relative  location  and  scale  of  all  surfaces  that  are  in  contact  with  it, 
either  directly  or  indirectly.  The  ground,  which  is  special  in  that  it 
has no edges, is the only surface on which other surfaces are allowed to
rest. 


5.  The  environments  should  be  at  least  moderately  rich  in  parallel 
structures.  This  is  not  a  formal  constraint,  but  because  the 
perspective  information  used  by  Layout2  depends  heavily  on  the 
projections  of  parallel  edges  and  surfaces,  Layout2  will  make  little 
progress  in  finding  the  three-dimensional  structure  of  the  layout  unless 
the  arrangement  of  edges  and  surfaces  contains  a  fair  amount  of 
parallelism.  (As  is  noted  in  the  Conclusions  section  below,  this 
constraint  could  be  relaxed  somewhat  in  a  system  whose  representation  of 
the  environment  included  surface  texture,  because  the  optical  projection 
of  such  texture  also  contains  perspective  information.) 

2.1.2.     Optic  array  constraints.    The  following  constraints  are  imposed 
on  the  projections  from  the  environment  to  the  optic  array: 

1.  The  input  to  Layout2  consists  of  noise-free,  high-level  descriptions 
of the optic array at a particular point of observation, using such
geometrical  concepts  as  extended  lines  and  points  as  primitives. 

2.  The  complete  360  degree  sphere  of  the  optic  array  is  considered  to 
be  available  at  all  times  to  the  production  system.  This  is  consistent 
with  the  approach  taken  here  of  studying  available  visual  information 
rather  than  any  particular  visual  system.  Of  course  the  input  sensor  of 
many  visual  systems,  e.g.,  the  human  eye,  would  have  to  scan  the  optic 
array  over  time  in  order  to  register  the  available  visual  information. 

3.  The  position  of  the  point  of  observation  in  the  environment  must  be 
"typical",  i.e.,  must  have  a  non-zero  chance  of  occurring  naturally. 
This  constraint  prohibits  the  existence  of  optic  array  structures  that 
could  arise  only  because  the  point  of  observation  occupied  a  special 
position  in  which,  for  example,  fortuitous  alignments  of  points  or  lines 
in  the  optic  array  were  produced  by  projection. 

4. Finally, the position of the point of observation must be such that
there is no partial occlusion of one surface by another. This
constraint operates jointly on the layout of surfaces in the environment
as well as on the position of the point of observation. That is,
layouts must be chosen so that it is possible to have some points of
observation without partial occlusion, and then one of these points of
observation must be chosen.

2.2.  Rules 

Visual  information  exists  only  because  of  reliable  structures  in  the 
environment.  The  constraints  listed  above  provide  for  such  structures.  On 
the  basis  of  these  constraints,  rules  can  be  formulated  characterizing  the 
relation  between  the  optic  array  and  the  environment.  These  rules  embody  the 
visual  information  that  is  available  in  the  optic  array.  The  following  groups 
of rules are used in Layout2. The rules are stated here in substantially
abbreviated form; the detailed set of conditions and conclusions for each rule
is given in the program listing of Layout2 (Sedgwick, 1987).

2.2.1.  Inverse  projection  rule.  The  constraint  forbidding  partial 
occlusion  leads  immediately  to  the  simplification  embodied  in  the  inverse 
projection  rule,  which  states  that  for  each  patch,  line,  or  point  in  the  optic 
array there corresponds exactly one surface, edge, or corner, respectively, in
the  three-dimensional  environment.  Thus,  for  example,  a  six-sided  patch  in 
the  optic  array  must  be  the  projection  of  a  six-sided  surface  in  the 
environment.  This  rule  is  clearly  a  simplification,  adopted  for  convenience, 
that  would  have  to  be  relaxed  in  a  more  general  system. 

2.2.2. Perspective search rules. The implicit perspective structure of
the optic array consists of the vanishing points of projected edges and the
horizons of projected surfaces. If the environment is sufficiently rich in
parallel structures, then its projection into the optic array will contain
sufficient structure to specify these vanishing points and horizons. Layout2
uses  a  set  of  rules,  referred  to  here  as  the  perspective  search  rules,  to  find 
perspective  structure  when  possible.  For  some  layouts  a  complex  combination 
of  rules  must  be  applied  before  the  complete  perspective  structure  is  found. 
Because  the  conditions  under  which  these  rules  operate  are  partially 
redundant,  in  many  situations  more  than  one  combination  of  rules  can  be 
applied  to  reach  the  same  result. 

The  perspective  search  rules  depend  in  part  on  the  geometry  of  projection 
and  in  part  on  the  assumption  of  implicit  perspective  structure  in  the  optic 
array.  It  is  assumed,  on  the  basis  of  the  third  optic  array  constraint  listed 
above,  that  the  otherwise  fortuitous  occurrences  of  alignments  and 
intersections  in  the  optic  array  must  be  indicators  of  an  actual  but  implicit 
structure  that  is  being  projected  from  actual  structures  in  the  environment. 
This  implicit  structure  is  taken  to  be  the  perspective  structure  in  the  optic 
array  that  arises  from  parallelism  between  edges  and  between  surfaces  in  the 
environment.  (In  a  system  that  included  visual  information  arising  from 
motion of the point of observation, it normally would be possible to test the
validity  of  this  assumption.) 

1.  The  convergence  rule  states  that  if  the  projections  of  three  edges 
converge  toward  (but  do  not  themselves  lie  on)  a  single  point  in  the 
optic  array,  then  that  point  is  the  vanishing  point  of  the  three  edges. 

2.  The  horizon  rule  states  that  when  any  two  distinct  vanishing  points 
of  edges  of  a  surface  are  known  then  the  horizon  of  the  surface  is  the 
optic  array  line  passing  through  those  two  vanishing  points. 

3. The coplanarity rule states that if an edge is coplanar with a
surface  then  the  vanishing  point  of  the  edge  lies  on  the  horizon  of  the 
surface. 


4. The co-horizon rule states that if the intersection of the
projections of two edges lies on the horizon of a surface, then that
intersection is the vanishing point of the two edges.

5.  The  colinearity  rule  states  that  if  three  distinct  extrapolated 
intersections  of  projected  edges  are  colinear,  then  the  intersections 
are  the  vanishing  points  of  the  edges  and  they  lie  along  a  common 
horizon. 

6.  The  vanishing  point  rule  states  that  if  the  extrapolated  projection 
of  an  edge  passes  through  a  vanishing  point,  then  that  vanishing  point 
is  the  vanishing  point  of  the  edge. 

7.  The  contact  rule  states  that  if  an  edge  is  in  contact  with  the 
ground  plane  then  the  vanishing  point  of  the  edge  lies  on  the  horizon  of 
the  ground  plane. 
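
Two of the rules above, the convergence rule and the horizon rule, can be
sketched geometrically on a spherical optic array. This is an illustrative
reading of the abbreviated rule statements, not the OPS5 productions of
Layout2: the function names, the tolerance EPS, and the representation of
each projected edge by the unit normal of its great circle are assumptions,
and the check that the edges do not themselves pass through the candidate
point is omitted.

```python
import math

EPS = 1e-9  # tolerance; the input is assumed to be noise-free

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    length = math.sqrt(dot(v, v))
    if length < EPS:
        raise ValueError("degenerate direction")
    return tuple(c / length for c in v)

def edge_circle(p, q):
    """Unit normal of the great circle carrying the projection of edge p-q."""
    return unit(cross(p, q))

def distinct(a, b):
    """True if two unit vectors are neither equal nor antipodal."""
    c = cross(a, b)
    return math.sqrt(dot(c, c)) >= EPS

def convergence_rule(n1, n2, n3):
    """Common vanishing point of three projected edges, or None."""
    if not (distinct(n1, n2) and distinct(n1, n3) and distinct(n2, n3)):
        return None  # fewer than three different great circles
    candidate = unit(cross(n1, n2))
    if abs(dot(candidate, n3)) >= EPS:
        return None  # the third edge does not converge on the same point
    return candidate

def horizon_rule(vp1, vp2):
    """Unit normal of the horizon through two distinct vanishing points."""
    if not distinct(vp1, vp2):
        return None
    return unit(cross(vp1, vp2))

def on_horizon(vanishing_point, horizon):
    """Test used by the coplanarity and contact rules."""
    return abs(dot(vanishing_point, horizon)) < EPS

# Three edges on the plane z = -1, all parallel to the y axis.
a = edge_circle((1, 1, -1), (1, 5, -1))
b = edge_circle((-1, 1, -1), (-1, 5, -1))
c = edge_circle((0, 1, -1), (0, 5, -1))
vp_y = convergence_rule(a, b, c)
vp_x = (1.0, 0.0, 0.0)  # assumed already found for edges parallel to x
ground_horizon = horizon_rule(vp_x, vp_y)
```

In this example the normal of the recovered horizon is parallel to the
normal of the plane containing the edges, which is the sense in which the
horizon directly specifies the environment-centered orientation of a
surface.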

2.2.3.  Contact  search  rules.  Because  the  ground  plane  provides  the 
framework  of  three-dimensional  space  in  Layout2,  the  location  and  scale  of  a 
surface  can  only  be  determined  if  the  surface  can  be  brought  into  relation 
with  the  ground  plane.  This  is  done  either  by  finding  some  point(s)  where  the 
surface  itself  makes  contact  with  the  ground  plane  or  by  finding  such  points 
for  another  surface  that  in  turn,  either  directly  or  indirectly,  contacts  the 
surface  in  question. 

The  static,  monocular,  perspective  information  that  is  available  in 
Layout2,  however,  is  insufficient  to  ever  demonstrate  contact  with  the  ground 
plane  unless  an  additional  assumption  is  introduced.  This  is  referred  to  here 
as  the  optical  contact  assumption,  which  states  that  if  the  available 
information  is  such  that  some  point  could  be  in  contact  with  the  ground  plane 
then  it  will  be  assumed  that  the  point  is  in  contact  with  the  ground  plane.
(Additional  tests  of  the  validity  of  this  assumption  could  be  made  in  a  system 
including  more  visual  information,  such  as  that  arising  from  motion  of  the 
point  of  observation  or  from  cast  shadows.)

It  is  the  perspective  structure  of  the  optic  array  that  determines 
whether  contact  with  the  ground  plane  is  possible.  In  Layout2,  the  term
"contact"  is  used  somewhat  nonidiomatically;  to  say  that  a  surface  is  in 
contact  with  the  ground  means  that  the  entire  surface  is  resting  on  the 
ground;  to  say  that  an  edge  is  in  contact  with  the  ground  means  that  the  whole 
length  of  the  edge  is  resting  on  the  ground;  either  an  edge  or  a  surface  can 
have  a  single  corner  in  contact  with  the  ground,  but  in  this  case  the  edge  or
surface  itself  is  not  said  to  contact  the  ground.  The  following  rules  are
used  by  Layout2  to  determine  whether  the  perspective  structure  of  a  surface  is
such  that  all  or  part  of  it  could  be  in  contact  with  the  ground. 

1.  If  the  projection  of  a  corner  is  above  the  horizon  of  the  ground
plane,  then  the  corner  cannot  be  in  contact  with  the  ground;  if  not,
then  it  might  be.

2.  If  a  corner  is  at  the  upper  end  of  an  edge,  then  the  corner  cannot
be  in  contact  with  the  ground;  if  not,  then  it  might  be.

3.  If  both  corners  of  an  edge  contact  the  ground,  then  the  edge
contacts  the  ground;  on  the  other  hand,  if  one  corner  of  an  edge  cannot
contact  the  ground  then  the  edge  cannot  contact  the  ground. 

4.  If  two  edges  are  colinear,  then  their  ground  contact  status  must  be 
the  same. 

5.  If  two  edges  of  a  surface  contact  the  ground,  then  the  surface 
contacts  the  ground;  on  the  other  hand,  if  one  edge  of  a  surface  cannot 
contact  the  ground,  then  the  surface  cannot  contact  the  ground. 

6.  If  an  edge  is  parallel  to  the  ground,  then  both  of  its  corners  have
the  same  ground  contact  status. 

7.  If  a  surface  is  parallel  to  the  ground,  then  all  of  its  corners  have
the  same  ground  contact  status. 
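
The way these rules combine with the optical contact assumption can be sketched as a small status propagation from corners to edges to surfaces. This is an illustration only, covering rules 1, 2, 3, and 5. The three status labels, the function names, and the decision to test exclusions before contacts are assumptions of the sketch, not details of Layout2.

```python
CANNOT, MIGHT, CONTACT = "cannot", "might", "contact"

def corner_possibility(above_ground_horizon, at_upper_end_of_edge):
    """Rules 1 and 2: either condition excludes ground contact for a corner."""
    if above_ground_horizon or at_upper_end_of_edge:
        return CANNOT
    return MIGHT

def assume_contact(status):
    """Optical contact assumption: whatever might touch the ground is taken to."""
    return CONTACT if status == MIGHT else status

def edge_status(corner_a, corner_b):
    """Rule 3: both corners in contact give contact; one excluded corner excludes."""
    if CANNOT in (corner_a, corner_b):
        return CANNOT
    if corner_a == CONTACT and corner_b == CONTACT:
        return CONTACT
    return MIGHT

def surface_status(edges):
    """Rule 5: two edges in contact give contact; one excluded edge excludes."""
    if CANNOT in edges:
        return CANNOT
    if sum(1 for e in edges if e == CONTACT) >= 2:
        return CONTACT
    return MIGHT

# Front face of a block resting on the ground, seen from above its top:
# every corner projects below the ground horizon, and the two upper corners
# are the upper ends of the vertical edges.
lower_left = assume_contact(corner_possibility(False, False))
lower_right = assume_contact(corner_possibility(False, False))
upper_left = assume_contact(corner_possibility(False, True))
upper_right = assume_contact(corner_possibility(False, True))

bottom_edge = edge_status(lower_left, lower_right)
left_edge = edge_status(lower_left, upper_left)
right_edge = edge_status(lower_right, upper_right)
top_edge = edge_status(upper_left, upper_right)
front_face = surface_status([bottom_edge, left_edge, right_edge, top_edge])
```

In this example the bottom edge is taken to rest on the ground while the face itself is excluded, which matches the sense of "contact" defined above: a surface contacts the ground only if the entire surface rests on it.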

2.2.4.  Three-dimensional  search  rules.  Layout2  uses  the  perspective
structure  of  the  optic  array,  together  with  information  about  ground  contact, 
to  find  the  three-dimensional  properties  of  the  spatial  layout  of  surfaces  in 
the  environment.  For  edges  with  vanishing  points  and  surfaces  with  horizons, 
their  three-dimensional  angular  orientation,  relative  to  the  fixed  framework 
of  the  environment,  is  specified.  In  addition,  scale  and  location  are 
determined  for  any  surface  whose  relation  to  the  ground  plane  is  specified.
Finally,  if  the  metric  height  above  the  ground  plane  of  the  point  of 
observation  is  specified,  then  metric  measures  of  size  and  location  can  be
found. 

The  following  rules  briefly  summarize  the  main  links  between  optic  array 
structure  and  three-dimensional  structure  that  are  used  by  Layout2.  More 
detailed  discussion  of  these  rules  has  been  given  elsewhere  (Sedgwick  1980, 
1983a,  1986). 

1.  A  vanishing  point  implies  edge  orientation.  Any  edge  is  parallel  to 
the  vector  from  the  point  of  observation  to  the  edge's  vanishing  point. 
Hence,  this  vector  directly  specifies  the  orientation  of  the  edge. 

2.  Vanishing  points  imply  three-dimensional  angles.  The  three- 
dimensional  angle  between  two  edges  is  equal  to  the  3-D  angle,  subtended 
at  the  point  of  observation,  between  the  vanishing  points  of  the  two
edges. 

3.  An  horizon  implies  surface  orientation.  Any  surface  is  parallel  to 
the  reference  surface  defined  by  its  horizon  and  the  point  of 
observation.  Hence,  this  reference  surface  directly  specifies  the
orientation  of  the  surface.

4.  Direct  ground  contact  implies  scaled  ground  location.  If  it  is 
assumed  that  a  corner  is  directly  in  contact  with  the  ground  plane,  then
the  position  of  the  corner's  projection  in  the  optic  array  implies  a
unique  location  of  the  corner  on  the  horizontal  ground  plane.  This
location  can  be  specified  in  scaled  x-  and  z-coordinates  (the  y- 
coordinate  is  vertical),  where  the  scale  factor  is  the  (unknown)  height 
of  the  point  of  observation  above  the  ground  plane.  (Position  in 
Layout2  is  specified  by  the  environment-centered  concept  of  "location," 
in  which  an  object's  position  is  given  by  associating  it  with  a  fixed 
point  on  the  ground  plane;  this  location  is  invariant  with  movement  of 
the  point  of  observation.  The  use  of  a  coordinate  system  centered  on 
the  point  of  observation  to  assign  numerical  values  to  locations  is  only
a  matter  of  convenience  in  the  implementation  of  Layout2  and  does  not
affect  the  underlying  environment-centered  representation  of  position.)

5.  Indirect  ground  contact  implies  scaled  ground  location.  If  a  corner
is  at  one  end  of  a  chain  of  connected  edges  whose  other  end  is  assumed
to  contact  the  ground  plane,  then  the  perspective  structure  of  this
chain  implies  a  unique  location  on  the  horizontal  ground  plane  that  is
directly  below  the  corner  (i.e.,  is  a  vertical  projection  of  the  corner
onto  the  ground  plane).  This  location  can  be  specified  in  scaled  x-  and
z-coordinates,  as  above,  and  is  taken  to  be  the  ground  location  of  the
corner.

6.  The  horizon-ratio  relation  implies  scaled  height.  If  a  corner  has
indirect  ground  contact,  then  the  scaled  height  of  the  corner  above  the
ground  plane  is  given  by  the  horizon-ratio  relation,  which  states  that
the  ratio  of  the  height  of  an  object  to  the  height  of  the  point  of
observation  is  equal  to  the  ratio  of  the  tangent  of  the  angular  height
of  the  object  above  its  ground  plane  location  to  the  tangent  of  the 
angular  distance  from  its  ground  plane  location  to  the  horizon  of  the 
ground  plane. 

7.  Scaled  heights  and  orientation  imply  scaled  length.  The  scaled 
length  of  an  edge  is  determined  by  simple  trigonometric  relations 
between  the  orientation  of  the  edge  and  the  scaled  heights  above  the 
ground  of  its  two  corners.

8.  One  metric  value  implies  all  metric  values.  If  the  height  of  the 
point  of  observation  is  unspecified,  then  locations  and  lengths  are 
specified  only  to  within  a  scale  factor.  If,  however,  any  one  metric 
value  in  the  environment  is  somehow  specified,  then  the  metric  height  of 
the  point  of  observation  is  determined  and  the  metric  values  of  all 
scaled  locations  and  lengths  in  the  layout  of  surfaces  are  also 
determined.

2.3.  Implementation  of  Layout2

A  production  system  was  selected  as  the  appropriate  tool  for  carrying  out 
this  investigation.  Such  a  system  allows  a  set  of  inference  rules  to  be 
modeled  and  allows  their  data-directed  interactions  to  be  observed.  Layout2
is  implemented  in  the  OPS5  production-system  language,  augmented  by  a
collection  of  LISP  functions  for  computation.  A  detailed  description  of  the 
implementation  of  Layout2  is  presented  elsewhere  (Sedgwick,  1987). 

Layout2  begins  a  run  with  a  file  containing  environment-centered
specifications  for  a  three-dimensional  layout  of  surfaces.  The  successive 
phases  of  the  production  system  first  project  this  three-dimensional  layout 
into  the  optic  array  and  then  attempt  to  recover  the  three-dimensional  layout 
from  this  optic  array  projection.  It  is  thus  possible  to  evaluate  Layout2's 
performance  with  any  given  spatial  layout  by  comparing  the  final 
representation  of  the  three-dimensional  layout  with  its  initial  specifications.

When  Layout2  successfully  recovers  the  three-dimensional  layout  of  a  set
of  surfaces,  the  final  representation  explicitly  includes  the  complete 
perspective  structure  of  the  layout  (i.e.,  the  vanishing  points  of  all  edges 
and  the  horizons  of  all  surfaces),  adjacency  relations  (i.e.,  which  corners,
edges,  and  surfaces  are  resting  on  the  ground,  which  edges  share  corners,  and
which  surfaces  share  edges),  the  lengths  of  all  edges,  the  magnitude  of  all 
internal  angles  between  connected  edges,  the  orientations  of  all  edges  and 
surfaces  relative  to  the  framework  of  the  environment,  and  the  locations  of 
all  corners  relative  to  the  ground  plane  (i.e.,  each  corner's  height  above  the
ground  and  coordinates  specifying  the  location  of  the  corner's  projection  onto
the  ground  plane).  Layout2's  representation  of  spatial  layout  is  thus  biased 
more  toward  completeness  than  compactness,  with  the  intention  of  capturing 
some  of  the  richness  of  information  that  seems  to  be  present  in  human 
perception. 

Nevertheless,  much  additional  three-dimensional  information  is  left 
implicit  in  Layout2's  representation  of  surface  layout.  Most  information 
about  relations  between  different  entities  in  the  environment  is  only 
implicit.  This  includes  the  relative  lengths  of  edges,  the  angles  between 
surfaces  and  between  non-adjacent  edges,  the  parallelness  of  edges  and  of
surfaces,  and  the  distances  between  corners.  Some  adjacency  relations  are
also  left  implicit;  for  example,  if  two  edges  share  the  same  corner  then  it  is
implicit  that  the  edges  touch  at  the  corner.  Much  of  this  information  is  left
implicit  because  it  could  be  obtained  easily  from  the  explicit  representation 
and  to  make  all  of  it  explicit  would  greatly  expand  the  size  of  the
representation.

3.  EVALUATION  OF  LAYOUT2

In  what  follows,  a  number  of  test  cases  are  described  and  Layout2's 
performance  on  them  is  discussed.  The  purpose  of  this  is  to  obtain  some  sense 
of  Layout2's  capabilities  and  limitations  and  to  get  some  of  the  flavor  of  how 
Layout2  goes  about  determining  the  three-dimensional  characteristics  of
different  surface  layouts.  The  broader  implications  of  this  evaluation  are 
discussed  in  the  Conclusions  section. 

Most  of  the  test  cases  consist  of  only  one  or  two  objects,  each  made  up 
of  several  surfaces.  The  limited  number  of  objects  and  surfaces  in  each  scene 
generally  does  not  reflect  a  limitation  in  Layout2  but  rather  arises  from  a 
practical  wish  to  limit  the  size  of  the  representation  and  the  length  of  the 
runtime  of  the  production  system  and  also  from  the  consideration  that 
extensions  in  the  number  of  similar  objects  are  generally  straightforward  in
Layout2  and  so  add  no  new  information  to  its  evaluation. 

The  test  cases  described  here  illustrate  some  of  Layout2's  successes. 
Layout2  is  data-driven  and  does  not  succeed  with  every  possible  layout  of 
surfaces.  When  Layout2  fails,  it  may  leave  certain  information  unspecified  in 
the  representation  or  may  specify  some  information  incorrectly.  Because 
Layout2  is  based  on  the  use  of  perspective  information,  its  success  depends  on 
the  environment  having  at  least  a  moderately  rich  set  of  parallel  structures. 
When  Layout2  fails,  this  is  an  indication  that  the  perspective  and  ground- 
contact  information  embodied  in  Layout2  is  insufficient  to  recover  the  three- 
dimensional  structure  of  that  particular  layout  of  surfaces. 

The  descriptions  given  here  are  both  sketchy  and  selective,  emphasizing 
only  those  points  concerning  each  test  case  that  are  of  particular  interest. 
Likewise,  the  figures  illustrating  the  test  cases  are  only  sketches,  not 
accurate  renderings.  A  number  of  the  test  cases  cannot  be  adequately  shown 
from  the  point  of  view  of  the  observer  because  a  sketch  cannot  capture  a
sufficiently  wide  field  of  view;  in  some  cases  the  test  case  is  shown  from
another  point  of  observation  and  the  observer  is  included  in  the  scene  to
indicate  the  location  of  the  point  of  observation.  Throughout  this  section
reference  is  made  to  the  rules  named  and  described  in  the  Overview. 

3.1.  Resting  block.

The  simple  rectangular  block  (Figure  1),  represented  here  by  its  three
visible  sides,  is  perhaps  the  simplest  structure  for  which  the  convergence 
rule  alone  is  able  to  determine  the  complete  vanishing  point  structure.  There 
are  three  sets  of  three  parallel  edges,  with  each  set  determining  one 
vanishing  point  (VP)  by  the  extrapolation  of  their  projections  to  the  point  at 
which  they  intersect.  (Note  that  in  the  planar  sketch  given  here  the
projections  of  parallel  edges  that  are  in  the  frontal  plane,  such  as  the 
vertical  edges  here,  do  not  intersect;  special  cases  of  this  sort  do  not  arise 
in  the  optic  array.)  Each  pair  of  vanishing  points  then  determines  a  horizon 
by  the  horizon  rule,  thus  establishing  the  complete  perspective  structure  of
the  block.


Figure  1     Resting  block 


In  this  case,  as  in  every  other  case,  once  the  perspective  structure  has
been  found  during  Layout2's  perspective  search  phase,  the  three-dimensional 
orientations  of  edges  and  surfaces  are  determined  during  the  three-dimensional 
search  phase  by  using  the  rules  that  a  vanishing  point  implies  edge 
orientation  and  an  horizon  implies  surface  orientation.  All  of  the  internal 
angles  between  adjacent  edges  are  determined  from  the  rule  that  vanishing 
points  determine  three-dimensional  angles. 
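
As a hedged illustration of the rule that vanishing points imply three-dimensional angles, the sketch below computes the angle subtended at the point of observation by two vanishing-point directions. The block orientation (turned 30 degrees about the vertical) and all names are hypothetical. Because a vanishing point fixes an edge's direction only up to sign, the sketch assumes each direction has been chosen to point along the edge away from the shared corner.

```python
import math

def angle_between(vp1, vp2):
    """Angle in degrees subtended at the point of observation by two vanishing points."""
    dot = sum(a * b for a, b in zip(vp1, vp2))
    n1 = math.sqrt(sum(a * a for a in vp1))
    n2 = math.sqrt(sum(b * b for b in vp2))
    cosine = max(-1.0, min(1.0, dot / (n1 * n2)))  # guard against rounding
    return math.degrees(math.acos(cosine))

# Vanishing-point directions for a rectangular block turned 30 degrees about
# the vertical (y) axis: two horizontal edge directions and the vertical.
turn = math.radians(30.0)
vp_side_a = (math.cos(turn), 0.0, math.sin(turn))
vp_side_b = (-math.sin(turn), 0.0, math.cos(turn))
vp_vertical = (0.0, 1.0, 0.0)

angles = [angle_between(vp_side_a, vp_side_b),
          angle_between(vp_side_a, vp_vertical),
          angle_between(vp_side_b, vp_vertical)]
```

For a rectangular block all three angles come out as right angles regardless of the turn, since the result depends only on the edge orientations and not on where the block is located.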

The  perspective  structure  of  the  block  is  also  used  by  Layout2  to  find 
the  lower  edges  of  the  block  and  to  determine  that  these  could  be  resting  on 
the  ground  plane.  It  is  then  assumed  that  these  edges  are  resting  on  the 
ground.  This  determination  of  ground  contact  allows  the  three-dimensional 
search  phase  to  find  the  location  on  the  ground  of  the  three  visible  corners
resting  on  the  ground  and  to  determine  the  ground  location  of  the  four  visible
corners  that  are  supported  above  the  ground;  these  latter  locations  are  at  the
corners'  vertical  projections  onto  the  ground.  Note  that  the  ground  location
of  the  upper  rear  corner  is  correctly  determined  even  though  it  has  no  direct
visible  connection  with  the  ground.  With  ground  locations  correctly
determined,  the  height  above  the  ground  of  all  corners  not  resting  on  the
ground  is  determined  by  the  horizon-ratio  relation,  and  the  lengths  of  all
edges  are  determined  from  their  three-dimensional  orientations  and  the  heights
of  their  corners.

Locations,  heights,  and  lengths  are  all  determined  only  to  within  a  scale 
factor  by  the  visual  information  in  the  scene.  Only  if  the  height  of  the 
point  of  observation  above  the  ground,  or  some  equivalent  information,  is
externally  provided  can  metric  determinations  of  these  values  be  made.  In  all
of  the  test  cases,  for  the  purpose  of  allowing  the  direct  comparison  of  the 
three-dimensional  values  determined  by  Layout2  with  the  initial  input 
specifications  of  the  test  case,  the  correct  height  of  the  point  of 
observation  is  simply  inserted  into  the  system. 
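
The step from scaled to metric values can be sketched as follows. The sketch is illustrative and rests on several assumptions: directions are (x, y, z) vectors from the point of observation with y vertical; the unit of scaled length is the height of the point of observation; and the tangent terms of the horizon-ratio relation are read as extents measured above and below the ground horizon in the vertical plane through the corner. The function names and the eye height of 1.5 are hypothetical.

```python
import math

def scaled_ground_location(direction):
    """Scaled (x, z) of a ground point seen in `direction`.

    In scaled units the ground plane is y = -1, so the line of sight is
    extended until its vertical component reaches -1.
    """
    x, y, z = direction
    if y >= 0:
        raise ValueError("a direction at or above the ground horizon never meets the ground")
    return (x / -y, z / -y)

def scaled_height(ground_direction, corner_direction):
    """Scaled height of a corner above its ground location (horizon-ratio reading)."""
    gx, gy, gz = ground_direction
    cx, cy, cz = corner_direction
    tan_below = -gy / math.hypot(gx, gz)  # ground location, measured below the horizon
    tan_above = cy / math.hypot(cx, cz)   # corner, above (+) or below (-) the horizon
    return (tan_below + tan_above) / tan_below

def to_metric(scaled_value, eye_height):
    """Rescale once the metric height of the point of observation is supplied."""
    return scaled_value * eye_height

eye_height = 1.5                 # hypothetical metric height of the point of observation
ground_dir = (0.0, -1.5, 3.0)    # direction to the corner's ground location
corner_dir = (0.0, 1.5, 3.0)     # direction to the corner itself

location = scaled_ground_location(ground_dir)
height = scaled_height(ground_dir, corner_dir)
metric_location = tuple(to_metric(v, eye_height) for v in location)
metric_height = to_metric(height, eye_height)
```

Only the last step needs the eye height; everything before it is determined by the optic array alone, which is the sense in which locations and heights are specified to within a scale factor.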

3.2.  Tower. 

This  test  case  (Figure  2)  has  the  same  structure  as  the  previous  test 
case,  but  is  of  a  larger  scale,  being  several  times  higher  than  the  point  of 
observation.  This  change  produces  an  important  effect  on  the  perspective 
information.  Instead  of  being  entirely  below  the  ground  plane  horizon,  as  the 
block  was,  the  tower  is  partially  below  and  partially  above  the  ground 
horizon.  This  means  that  the  top  of  the  tower  is  no  longer  visible;  only  two 
sides  are  visible. 


Figure  2    Tower


Having  only  two  visible  sides  results  in  a  reduction  in  the  amount  of
perspective  information  that  is  available.  The  convergence  rule  requires 
three  parallel  edges  in  order  to  determine  a  vanishing  point;  thus  it  can 
operate  on  the  three  vertical  edges  to  determine  the  vertical  vanishing  point 
but  it  does  not  have  sufficient  information  to  find  the  vanishing  points  of 
the  sides. 
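
The reason three edges are needed can be shown with a short sketch: any two distinct optic array lines intersect somewhere, so only agreement with a third line is evidence of a vanishing point. The code is an illustration under assumed conventions (directions from the point of observation with y vertical, lines stored as plane normals), and the tower coordinates, function names, and tolerance are hypothetical. It also assumes the first two lines are distinct.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def projection_line(end_a, end_b):
    """Optic array line of an edge, from its end points relative to the observer."""
    return unit(cross(end_a, end_b))

def convergence_rule(lines, tol=1e-9):
    """Return the shared intersection of three or more lines, else None."""
    if len(lines) < 3:
        return None  # two lines always meet, so two alone are not evidence
    candidate = unit(cross(lines[0], lines[1]))
    if all(abs(dot(candidate, line)) < tol for line in lines[2:]):
        return candidate
    return None

# Three visible vertical edges of a tower (ground at y = -1, roof at y = 5).
verticals = [projection_line((-1.0, -1.0, 4.0), (-1.0, 5.0, 4.0)),
             projection_line((1.0, -1.0, 4.0), (1.0, 5.0, 4.0)),
             projection_line((1.0, -1.0, 6.0), (1.0, 5.0, 6.0))]
vertical_vp = convergence_rule(verticals)

# One visible side contributes only two parallel horizontal edges.
side = [projection_line((1.0, -1.0, 4.0), (1.0, -1.0, 6.0)),
        projection_line((1.0, 5.0, 4.0), (1.0, 5.0, 6.0))]
side_vp = convergence_rule(side)

# A third edge that is not parallel to the first two breaks the convergence.
mixed = verticals[:2] + [projection_line((-1.0, -1.0, 4.0), (1.0, -1.0, 6.0))]
mixed_vp = convergence_rule(mixed)
```

Here the vertical edges yield a vanishing point in the vertical direction, while the two side edges alone yield nothing and must be resolved by another rule.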

The  vanishing  points  of  the  sides  are  still  found,  however,  because  the 
tower's  base  and  roof  are  parallel  to  the  ground  plane.  Because  the 
extrapolated  projections  of  the  two  edges  of  each  side  have  intersections  that 
coincide  with  the  horizon  of  the  ground  plane,  these  intersections  are 
labeled  as  vanishing  points  by  the  co-horizon  rule.  In  all  other  respects  the
perspective  structure  and  the  three-dimensional  properties  of  the  tower  are
found  in  just  the  same  way  as  are  those  of  the  smaller  block.

3.3.  Closed  room.


This  test  case  (Figure  3)  is  identical  in  structure  to  the  previous  two. 
The  difference  is  that  the  point  of  observation  is  now  located  inside  the 
structure.  This  results  in  all  six  sides  of  the  room  being  visible  and 
present  in  the  representation.  This  test  case  demonstrates  the  way  in  which 
the  360  degree  optic  array  represents  the  projection  of  the  entire 
environment,  from  every  direction.  Visual  information,  such  as  that  studied
by  Layout2,  is  simultaneously  present  in  every  direction  even  though  most
visual  systems  that  would  be  used  to  register  this  information  would  have  a 
more  limited  field  of  view  and  would  have  to  employ  sequential  scanning  to 
obtain  all  of  the  information  present  in  the  optic  array. 


Figure  3    Closed  room


The  use  of  an  array-filling,  large-scale,  test  object  also  illustrates 
the  importance  of  perspective  information  in  a  way  that  smaller  scale  objects 
do  not.  With  small-scale  objects  that  take  up  only  a  small  portion  of  the
field  of  view,  there  is  little  difference  between  optic  array  projection  and 
projection  onto  a  flat  picture  plane,  and  there  is  also  little  difference 
between  true  perspective  projection  and  parallel  or  orthographic  projection. 
A  number  of  analyses  of  visual  information  for  spatial  layout  have  been  based 
on  parallel  rather  than  perspective  projection  because  the  former  is  easier  to 
work  with  and  it  is  assumed  that  there  is  no  effective  difference  between  the 
two  forms  of  projection  (e.g.,  Kanade,  1980,  1981).  Such  an  assumption  is  not
valid  for  a  large-scale  environment,  however,  as  the  present  test  case  clearly
demonstrates.  There  is  no  coherent  way  of  even  producing  a  parallel
projection  of  a  room  surrounding  the  point  of  observation.

As  in  the  first  test  case,  the  convergence  rule  is  sufficient  to 
determine  the  vanishing  point  structure  of  the  room.  Because  the  room  has  six 
visible  sides,  however,  there  is  some  redundancy  in  determining  this
structure.  There  are  now  four  parallel  edges  in  each  of  the  three  sets,  but
only  three  parallel  edges  are  necessary  for  the  convergence  rule  to  find  a
vanishing  point.  The  fourth  parallel  edge  in  each  set  is  handled  by  the 
vanishing  point  rule,  which  determines  that  the  extrapolation  of  the  edge's 
projection  passes  through  a  vanishing  point  which  has  already  been  determined 
and  so  assigns  that  vanishing  point  to  the  edge.  The  horizons  are  determined 
as  in  the  previous  test  cases,  as  are  the  three-dimensional  orientations  of 
edges  and  surfaces. 


Ground  contact  and  all  that  follows  from  it  also  are  determined  in  the 
same  way  here  as  in  the  first  test  case.  The  determination  of  ground  contact  for
the  room  points  out  a  simplification  in  Layout2.  Because  Layout2  has  no 
apparatus  for  dealing  with  occlusion,  the  horizon  of  the  ground  plane  simply 
is  treated  as  part  of  the  optic  array  even  though,  in  a  real  closed  room,  the 
walls  of  the  room  would  hide  the  horizon  of  the  ground  plane.  In  a  more 
sophisticated  representation  of  information  than  is  embodied  in  Layout2, 
approximate  information  concerning  the  location  of  the  terrestrial  horizon 
would  be  supplied  by  some  non-visual  source,  and  this  approximate  information 
would  then  be  used  to  identify  the  horizon  of  the  floor  as  being  close  to  the 
horizon  of  the  ground:  these  two  horizons  would  then  be  equated,  by  default, 
thus  providing  a  precise  visual  source  of  information  about  the  horizon  of  the 
ground  plane.  (In  humans  the  vestibular  system  acts  as  one  source  of
approximate,  non-visual  information  for  the  location  of  the  terrestrial 
horizon.  A  default  identification  of  the  horizon  of  the  floor  of  the  room 
with  the  horizon  of  the  ground  plane  could  of  course  lead  to  misperceptions  of
orientation  if  the  floor  of  the  room  was  not  actually  horizontal.  There  is 
ample  evidence  of  such  misperceptions  in  human  vision.) 

3.4.  Twin  towers. 

The  two  towers  in  this  test  case  (Figure  4)  are  similarly  oriented  but, 
like  the  twin  towers  of  the  World  Trade  Center,  are  diagonally  offset  relative
to  one  another.  The  point  of  observation  is  located  between  them. 

The  only  novel  feature  of  this  test  case  is  that  there  are  two  towers. 
Because  perspective  structure  is  invariant  across  changes  in  position,  the 
perspective  structures  of  the  two  towers  are  identical;  they  share  the  same 
set  of  three  vanishing  points  and  three  horizons.  The  search  procedures  of 
Layout2  make  use  of  this  redundancy,  finding  the  perspective  structure  of  the 
second  tower  quite  rapidly  after  the  perspective  structure  of  the  first  tower 
has  been  determined.  More  generally,  this  test  case  points  out  that  an 
environment  with  many  objects  in  it  may  nevertheless  have  a  very  simple 
perspective  structure  if  the  objects  have  many  parallel  sides.  Only  a  new 
orientation  leads  to  another  vanishing  point  or  horizon  in  the  optic  array.


Figure  4     Twin  towers


Note  that  if  one  of  the  towers  were  turned  relative  to  the  other  tower 
then  the  two  towers  would  have  somewhat  differing  perspective  structures; 
there  would  be  five  vanishing  points  and  five  horizons,  with  both  towers  still 
sharing  the  vertical  vanishing  point  and  the  ground  horizon.  The  relational 
properties  of  the  perspective  structures  of  the  two  towers  would  be  identical, 
but  one  structure  simply  would  be  displaced  relative  to  the  other  in  the 
array.  A  more  sophisticated  representation  of  visual  information  than  is 
contained  in  Layout2  would  recognize  and  make  use  of  the  identical  "shapes"  of
the  perspective  structures  of  objects  having  similar  three-dimensional  shapes 
but  different  orientations. 
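
The counts of vanishing points and horizons given above can be checked with a short sketch. It is illustrative only: each tower is reduced to its three edge directions (y vertical), vanishing points are treated as directions defined up to sign, and each horizon is represented by the normal of the plane through the point of observation that contains it. The 30-degree turn and all names are hypothetical.

```python
import math
from itertools import combinations

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def distinct(vectors, tol=1e-9):
    """Keep one representative of each direction, ignoring sign."""
    kept = []
    for v in vectors:
        if not any(abs(dot(v, k)) > 1.0 - tol for k in kept):
            kept.append(v)
    return kept

def tower_edge_directions(turn_degrees):
    """Edge directions of a rectangular tower turned about the vertical axis."""
    t = math.radians(turn_degrees)
    return [(math.cos(t), 0.0, math.sin(t)),
            (-math.sin(t), 0.0, math.cos(t)),
            (0.0, 1.0, 0.0)]

def perspective_structure(towers):
    """Return (number of vanishing points, number of horizons) for a set of towers."""
    vanishing_points = distinct([d for tower in towers for d in tower])
    # Each pair of edge directions of a tower spans one face orientation,
    # whose horizon is the line through the two vanishing points.
    horizons = distinct([unit(cross(a, b))
                         for tower in towers
                         for a, b in combinations(tower, 2)])
    return len(vanishing_points), len(horizons)

aligned = perspective_structure([tower_edge_directions(0.0),
                                 tower_edge_directions(0.0)])
turned = perspective_structure([tower_edge_directions(0.0),
                                tower_edge_directions(30.0)])
```

Position never enters the calculation, which reflects the point made above: only a new orientation adds a vanishing point or a horizon.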

3.5.  House.

The  simple  house  (Figure  5)  is  a  somewhat  more  complex  case  than  those 
considered  thus  far  and  serves  to  illustrate  the  interaction  of  a  variety  of
rules  in  determining  the  perspective  structure  of  the  optic  array. 

The  vanishing  points  of  the  three  visible  vertical  edges  of  the  house  and 
of  its  three  visible  parallel  horizontal  edges  are  determined  by  the 
convergence  rule.  This  leaves  three  sloping  edges  and  one  horizontal  edge  for 
which  no  perspective  structure  is  determined.  From  the  vanishing  points  of 
the  vertical  edges,  the  contact  search  can  determine  that  although  the  three 
sloping  edges  cannot  be  resting  on  the  ground,  the  horizontal  edge  may  be.
Once  that  edge  is  assumed  by  default  to  have  ground  contact,  the  contact  rule 
determines  its  vanishing  point  by  finding  the  intersection  of  its  extrapolated 
projection  with  the  terrestrial  horizon.  The  front  surface  of  the  house  now
has  two  vanishing  points  associated  with  it,  and  so  its  horizon  can  be
determined  by  the  horizon  rule.  The  intersection  of  the  extrapolated
projections  of  the  two  parallel  sloping  edges  lies  on  this  horizon  and  so  is 
labeled  as  their  vanishing  point  by  the  co-horizon  rule.  The  vanishing  point 
of  the  remaining  sloping  edge  is  determined  by  the  coplanarity  rule.  Thus,  by 
a  rather  complex  interaction  of  the  rules  for  finding  perspective  structure 
and  for  determining  ground  contact,  the  complete  perspective  structure  of  the 
house  and  its  relation  to  the  ground  are  determined. 


Figure  5    House


3.6.  Cantilevered  arch. 

This  test  case  (Figure  6)  illustrates  the  application  of  Layout2  to  a 
simple  origami-like  object.  The  connected  surfaces  here  do  not  form  a  solid 
object,  but  Layout2  determines  their  perspective  structure  in  the  same  way  as
with  a  solid  object. 

All  of  the  surfaces  contribute  parallel  horizontal  edges  having  a  common 
vanishing  point  that  is  determined  by  the  convergence  rule.  The  intersections 
of  the  projections  of  each  of  the  other  pairs  of  parallel  edges,  one  from  each 
surface,  lie  along  a  single  line  that  is  the  horizon  of  the  surfaces  that 
would  be  the  sides  of  the  arch  if  it  had  sides.  This  configuration  invokes 
the  colinearity  rule,  which  determines  all  of  these  vanishing  points  as  well 
as  their  horizon.  Thus  the  perspective  structure  of  the  arch  is  completely 
determined. 

From  this  perspective  structure  Layout2  determines  that  the  lower  edge  of
the  arch  can  be,  and  by  default  is,  in  contact  with  the  ground.  The  three-
dimensional  locations  of  all  corners  and  lengths  of  all  edges  are  then
determined.  Note  that  information  about  location  travels  up  the  edges  from
the  corners  that  directly  contact  the  ground  to  those  whose  contact  with  the
ground  is  only  indirect. 


Figure  6     Cantilevered  Arch

3.7.  Skewed  hexagonal  pavement. 

This  test  case  (Figure  7)  illustrates  the  presence  of  perspective 
structure  in  objects  devoid  of  right  angles.  A  pavement  is  composed  of  skewed 
hexagons,  each  having  three  pairs  of  parallel  sides.  With  hexagons  parallel
to  (and  resting  on)  the  ground  plane  as  these  are,  one  hexagon  is  sufficient
to  satisfy  the  co-horizon  rule  and  thus  resolve  the  three-dimensional
structure.  If  the  hexagons  were  slanted  to  the  ground  plane,  then  two  would
be  necessary  to  invoke  the  convergence  rule  and  resolve  the  structure.  Any
additional  hexagons  are  redundant  and  are  simply  assimilated,  by  the  vanishing
point  rule,  to  the  perspective  structure  that  is  already  determined.


Figure  7    Skewed  hexagonal  pavement

3.8.  Irregular  slab. 

This  irregular  slab  (Figure  8)  illustrates  how  an  object  with  hidden  or 
irregular  structure  can  nevertheless  be  resolved  by  Layout2  if  enough  of  its
structure  is  visible. 


Figure  8     Irregular  slab


The  three  vertical  edges  determine  one  vanishing  point  by  the  convergence 
rule.  The  two  pairs  of  parallel  edges  of  the  two  front  surfaces  have 
projections  intersecting  on  the  terrestrial  horizon  and  so  determine  their 
vanishing  points  by  the  co-horizon  rule.  The  horizon  of  the  top  surface  is 
then  determined  by  the  horizon  rule.  Once  the  horizon  of  this  surface  has 
been  found,  the  vanishing  points  of  the  three  rear  edges  are  each  determined 
by  the  coplanarity  rule. 

Once  the  perspective  structure  has  been  found,  the  three-dimensional 
properties  are  determined  similarly  to  the  earlier  test  cases.  Note  again  (as 
in  the  first  test  case)  that  the  three  rear  corners  of  the  top  surface  have
their  height  above  the  ground  plane  correctly  determined  and  have  the 
locations  of  their  vertical  projections  onto  the  ground  plane  correctly 
determined  in  spite  of  their  having  only  indirect  contact  with  the  ground 
plane.  Location  information  travels  along  the  edges,  starting  with  corners
that  are  contacting  the  ground,  until  this  information  reaches  the  backmost
upper  corner.

4.  CONCLUSIONS 

4.1.  Present  results 

On  the  basis  of  the  evaluation  of  Layout2  given  in  the  preceding
section,  several  conclusions  can  now  be  drawn  concerning  the  investigation  of
visual  information  attempted  in  this  project. 

4.1.1.  The  perspective  structure  of  the  optic  array  in  a  terrestrial 
environment  can,  under  certain  constraints,  be  completely  determined  and  can, 
within  the  confines  of  the  present  representation,  completely  specify  the 
spatial  layout  of  the  environment.  The  actual  examples  tested  support  this 
conclusion  and  also  clearly  show  that  much  more  complicated  examples  could  be 
handled  without  any  alteration  of  Layout2.  This  conclusion  concerning  the 
visual  information  for  spatial  layout  already  has  been  supported  on  the  basis 
of  mathematical  analysis  prior  to  the  implementation  and  testing  of  Layout2 
(Sedgwick, 1983a), but the production system developed in this project
further  supports  this  claim.  It  does  so  both  by  incorporating  a  much  more 
detailed  analysis  of  the  visual  information  available  from  perspective  within 
a  terrestrial  setting  than  has  been  given  previously  and  also  by  rigorously 
testing  this  analysis  by  applying  it,  through  the  operation  of  the  production 
system,  to  a  variety  of  simulated  environments. 

Although  the  environments  simulated  and  tested  here  were  highly 
simplified,  some  of  these  simplifications  were  superficial  matters  of 
convenience,  and  the  need  for  some  others  could  be  eliminated  by  additions  to 
Layout2;  some  of  these  additions  were  mentioned  in  the  discussion  of  the  test 
cases, and others are discussed below. As one example of the way in which the
simplifications  of  Layout2  could  be  reduced,  it  could  be  noted  that  the 
dependence  of  perspective  on  parallel  edges  can  be  reduced  or  eliminated  by 
introducing textured surfaces. As I have shown elsewhere (Sedgwick, 1983a),
perspective  structures  can  be  determined  from  the  projection  of  statistically 
regular  textures.  Clearly,  however,  in  order  to  apply  the  visual  information 
modeled  here  to  systems  perceiving  real  environments,  this  information  would 
have  to  be  incorporated  into  a  system  that  could  deal  with  many  important 
problems  not  addressed  here,  such  as  the  detection  of  edges  and  the 
suppression  of  noise. 

4.1.2.  A  reasonably  complete  environment-centered  representation  of 
spatial  layout  has  been  developed,  and  it  has  been  shown  that  visual 
information  available  in  the  optic  array  can  specify  the  properties  of  such  a 
representation  directly,  without  any  need  to  use  an  intermediate,  viewer- 
centered  representation,  such  as  that  postulated  by  Marr  (1978,  1982).  The 
specification  of  the  environment-centered  representation  has  been  shown  to 
flow  naturally  and  simply  from  the  structure  of  the  optic  array.  To  introduce 
a  viewer-centered  representation  as  an  intermediate  step  in  this  process  would 
seem likely to complicate, rather than simplify, the analysis of visual
information  carried  out  by  Layout2. 

4.1.3.  A  start  has  been  made  on  investigating  how  multiple  sources  of 
information  can  interact  to  overcome  ambiguities  present  within  the  individual 
sources.  Layout2  permits  such  an  investigation  only  within  a  rather  narrow 
domain,  comprising  the  rules  for  finding  perspective  structure  and  ground 
contact  relations  in  different  projective  and  environmental  situations. 
Nevertheless,  the  interactions  that  occur  here,  as  demonstrated  in  some  of  the 
test  cases,  are  sufficiently  complex  to  be  of  some  interest.  The  most 
interesting  cases,  from  this  point  of  view,  are  those  that  at  first 
consideration  might  seem  not  to  have  enough  richness  of  projective  structure 
to determine their perspective structure. Several cases of this sort, where a
somewhat  subtle  and  complex  combination  of  rules  was  found  by  Layout2  to  be 
sufficient  to  determine  the  perspective  structure  and  hence  the  three- 
dimensional  properties  of  the  scenes,  clearly  illustrate  the  increased  power 
that  comes  from  combining  sources  of  information. 

The  rather  rich  structures  and  interactions  that  were  uncovered  in  some 
of  the  test  cases  suggest  that  the  rule-based  analysis  of  visual  information 
may  well  be  a  productive  one.  It  seems  unlikely  that  such  complex  underlying 
structures  would  be  revealed  by  approaches  to  combining  different  sources  of 
information that are based on taking weighted averages.

It perhaps needs to be reemphasized here that what has been studied in
this project is vision theory (the relations between environment and
projection that determine the possibilities of vision), not vision systems. No
claim is being made here that Layout2 is a useful or realistic model of a
vision system, or even of a small part of a vision system. Layout2 is being
used  to  explore  the  solution  space  within  which  any  vision  system  must 
operate;  the  underlying  assumption  in  this  endeavor  is  that  the  better  this 
solution  space  is  understood  the  greater  the  possibility  will  be  of 
understanding  or  creating  actual  working  vision  systems. 

4.1.4.  The  final  conclusion  is  that  the  production  system  has  been  found 
to  be  a  useful  tool  for  investigating  rule-based  descriptions  of  visual 
information.  This  usefulness  appeared  in  several  ways.  At  a  more  formal 
level,  the  production  system  offers  a  formalism  for  precisely  defining  and 
rigorously  testing  sources  of  information  that  had  been  less  precisely  defined 
already. The application of Layout2 to the test cases provides an unambiguous
demonstration  of  the  conclusions  stated  earlier  in  this  section. 

Although  not  so  apparent  from  this  report,  the  heuristic  value  of  the 
production system approach has been very apparent while carrying out this
project.  Much  of  what  is  formalized  here  was  discovered  in  response  to  the 
demands of the formalization. Many of the rules now incorporated in Layout2
were developed in response to difficulties raised by particular test cases for
earlier  versions  of  Layout2.  From  the  point  of  view  of  the  investigator,  more 
has  been  learned  so  far  from  the  process  of  developing  the  production  system 
than from running it once it was completed.

This  last  remark  also  reflects,  at  least  in  part,  the  limited  number  of 
test  cases  that  have  been  run  and  analyzed  thus  far.  Enough  examples  have 
been developed to demonstrate the basic properties of the production system,
as described in the Evaluation section, and to support the conclusions offered
in the present section, but this still constitutes only a very limited
exploration of Layout2's responses to various environments. How to use this
new  tool  systematically  and  to  greatest  advantage  is  still  a  question  open  for 
exploration. 

4.2.  Future  prospects 

The  experience  with  Layout2  suggests  a  number  of  modifications  and 
extensions  that  could  be  carried  out  in  further  versions.  Some  of  the  more 
substantive  of  these  are  mentioned  briefly  here. 

4.2.1.  The  only  default  contact  relations  represented  in  Layout2  are 
those  with  the  ground  plane.  A  fairly  simple  extension  would  allow  such 
contact  relations  to  be  represented  between  all  objects.  This  would  greatly 
extend  the  range  of  environments  that  could  be  represented.  For  example,  the 
relations  of  doors,  windows,  and  other  architectural  features  to  the  walls  and 
roof of a building, the use of surfaces of support other than the ground,
such  as  a  book  or  box  resting  on  a  table,  and  the  relations  of  surface  pattern 
features  to  a  surface,  could  all  be  captured  by  the  perspective  analysis  of 
contact  relations  with  only  minor  extensions  to  Layout2. 

4.2.2. One of the most severe current restrictions on the range of
environments  that  can  be  represented  in  Layout2  is  the  prohibition  of 
occlusion.  To  represent  and  analyze  occlusion  would  require  a  significant 
reworking  and  extension  of  Layout2,  but  it  is  clearly  a  next  step  to  take  in 
developing the analysis begun here. It appears that the type of analysis
carried  out  in  Layout2  might  offer  a  powerful  approach  to  occlusion  relations, 
although explorations in this direction have not yet been carried very far.

4.2.3.  Observation  of  human  pictorial  perception  makes  it  clear  that 
there  is  much  more  information  available,  even  in  such  simple  pictorial 
displays  as  those  used  here,  than  has  yet  been  incorporated  into  Layout2.  It 
would  be  of  interest  to  analyze  and  include  the  special  status  of  right 
angles,  the  effects  of  point  of  observation  on  what  is  visible,  the 
topological  relations  that  exist  within  a  closed  object,  the  effect  of 
perspective  compression  (foreshortening)  as  well  as  additional  linear 
perspective  relations,  and  other  such  sources  of  information.  An  attempt  to 
add  these  more  heterogeneous  sources  of  information  would  particularly 
challenge  the  rule-based  approach  to  the  combination  of  multiple  sources  of 
information.  Such  an  attempt  would  also  encourage  the  addition  of  more  non- 
metric  forms  of  information  to  the  representation  of  the  three-dimensional 
environment.

4.2.4. As has been mentioned earlier, although Layout2 employs a rather
rich  representation  of  the  environment,  much  information,  often  of  a 
relational  nature,  is  still  left  implicit.  As  a  way  of  making  this 
information accessible, a series of "query" productions might be added to
Layout2. These query productions would use the information implicit in the
representation to provide explicit answers to questions about the environment.
For  example,  such  productions  might  answer  such  questions  as  "what  is  the 
distance  between  A  and  B?",  "what  is  the  height  of  A  relative  to  B?",  "what 
edges  are  parallel  to  A?",  etc.  Adding  such  query  productions  to  Layout2 
should  be  quite  straightforward. 
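
As a rough illustration of what such queries would compute, the three example questions can be sketched in Python. This is not OPS5 and not part of Layout2: the Corner and Edge records and all function names are hypothetical, and the sketch assumes that each corner is stored with the ground-plane location of its vertical projection (x, z) and its height above the ground plane, as in the environment-centered representation described earlier.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Corner:
    name: str
    x: float       # ground-plane location of the corner's vertical projection
    z: float
    height: float  # height above the ground plane

@dataclass(frozen=True)
class Edge:
    name: str
    start: Corner
    end: Corner

def distance(a, b):
    """'What is the distance between A and B?'"""
    return math.dist((a.x, a.height, a.z), (b.x, b.height, b.z))

def relative_height(a, b):
    """'What is the height of A relative to B?'"""
    return a.height - b.height

def direction(edge):
    """Three-dimensional direction vector of an edge."""
    return (edge.end.x - edge.start.x,
            edge.end.height - edge.start.height,
            edge.end.z - edge.start.z)

def parallel_edges(target, edges, tol=1e-9):
    """'What edges are parallel to A?' Parallel means a zero cross product."""
    d = direction(target)
    names = []
    for e in edges:
        if e.name == target.name:
            continue
        o = direction(e)
        cross = (d[1] * o[2] - d[2] * o[1],
                 d[2] * o[0] - d[0] * o[2],
                 d[0] * o[1] - d[1] * o[0])
        if all(abs(c) <= tol for c in cross):
            names.append(e.name)
    return names

# Hypothetical scene fragment.
a = Corner("A", 0.0, 0.0, 0.0)
b = Corner("B", 3.0, 0.0, 4.0)
c = Corner("C", 1.0, 0.0, 0.0)
d = Corner("D", 0.0, 2.0, 0.0)
e = Corner("E", 2.0, 2.0, 0.0)
edges = [Edge("AC", a, c), Edge("DE", d, e), Edge("AB", a, b)]
```

In a production system each query would be a rule matching the relevant working-memory elements rather than a function call; the sketch shows only the computation that such a rule's action would carry out.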

4.2.5.  Finally,  and  more  distantly,  if  investigation  along  these  lines 
continues  to  seem  fruitful,  the  production  system  could  be  expanded  to  include 
other  broad  classes  of  visual  information,  such  as  information  provided  by 
projected  gradients  of  texture,  by  binocular  disparity,  and  by  the  motion  of 
objects  and  of  the  point  of  observation. 